    -R ${ref} \
    --genomicsdb-workspace-path ../gvcf21db \
    --batch-size 50 \
    --sample-name-map ../cohort.sample_map \
    -L chr21 \
    --tmp-dir ../tmp \
    --reader-threads 4

cd ..
mkdir vcf
ref=$(ls refgenome/*.fasta)
~/software/gatk-4.2.3.0/gatk \
    --java-options -Xmx10g \
    GenotypeGVCFs \
    -R ${ref} \
    -V gendb://gvcf21db \
    -O vcf/allsamples_chr21.vcf
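Before moving on, it can be useful to confirm that the joint-genotyped VCF contains every sample of the cohort. The following is a minimal sketch, not part of the GATK workflow itself; the function name list_samples is hypothetical, and the sketch assumes the VCF file is uncompressed.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a GATK command): print the sample names
# stored in an uncompressed VCF file, one per line.
list_samples() {
    # The "#CHROM" header line holds nine fixed columns
    # followed by one column per sample.
    grep -m1 '^#CHROM' "$1" | cut -f10- | tr '\t' '\n'
}

# Assumed location of the joint-genotyped VCF produced above.
vcf_file=vcf/allsamples_chr21.vcf
if [ -f "$vcf_file" ]; then
    list_samples "$vcf_file"
fi
```

The printed names can then be compared with the first column of the cohort.sample_map file, for example with "cut -f1 cohort.sample_map".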

4.2.2.2.12  Variant separation

The output of the previous step is a single VCF file containing both the identified SNPs and InDels. It is good practice to separate each kind of variant into its own VCF file because each type of variant may be used in a different analysis. The “SelectVariants” GATK tool is used to select a subset of variants based on specific criteria. We can use the “-select-type” option to select a specific variant type. The valid types are INDEL, SNP, MIXED, MNP, SYMBOLIC, and NO_VARIATION. The following script stores SNPs and InDels in separate VCF files:

#SNPs
~/software/gatk-4.2.3.0/gatk \
    --java-options -Xmx10g \
    SelectVariants \
    -V vcf/allsamples_chr21.vcf \
    -select-type SNP \
    -O vcf/allsamplesSNP_chr21.vcf

#InDels
~/software/gatk-4.2.3.0/gatk \
    --java-options -Xmx4g \
    SelectVariants \
    -V vcf/allsamples_chr21.vcf \
    -select-type INDEL \
    -O vcf/allsamplesIndels_chr21.vcf
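As a quick sanity check on the separation, the number of records in each file can be counted. The sketch below is an illustration only; the function name count_records is hypothetical, and it assumes uncompressed VCF files at the paths used above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count the variant records (non-header lines)
# in an uncompressed VCF file.
count_records() {
    # Header lines start with '#'; every other line is one variant record.
    # grep exits non-zero when the count is 0, so "|| true" keeps the
    # function usable in scripts that run with "set -e".
    grep -vc '^#' "$1" || true
}

# Print the record count of the combined, SNP-only, and InDel-only files
# when they exist.
for f in vcf/allsamples_chr21.vcf \
         vcf/allsamplesSNP_chr21.vcf \
         vcf/allsamplesIndels_chr21.vcf; do
    if [ -f "$f" ]; then
        printf '%s\t%s\n' "$f" "$(count_records "$f")"
    fi
done
```

The SNP and InDel counts need not add up to the total of the combined file, since records of other types (for example MIXED or MNP) are written to neither output.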

4.2.2.2.13  Variant filtering

Rather than the traditional variant filtering discussed with the previous variant callers, GATK4 uses an advanced filtering approach called Variant Quality Score Recalibration (VQSR) to filter variants called in the previous step. This approach is similar to the BQSR discussed above as it uses machine learning to model a variant profile in a high-quality training